Skip to content

Conversation

@LDong-Arm
Copy link
Contributor

@LDong-Arm LDong-Arm commented Mar 29, 2021

Summary of changes

Fixes: #14128, #14157

The PR does the following:

  • In Clean-up: TDBStore no longer requires BlockDevice to have flash behaviour #14007 we removed the use of the shim layer FlashSimBlockDevice from TDBStore tests and passed the real HeapBlockDevice directly to TDBStore, because flash simulation was no longer needed after TDBStore refactoring #11987. This caused TDBStore tests to fail (instead of skip properly) on low-RAM targets, because HeapBlockDevice::erase() does not trigger any heap allocations, unlike FlashSimBlockDevice::erase() which we used before.
    To fix this issue, use HeapBlockDevice::program() instead which allocates heap memory on the fly for blocks being programmed. If this fails, skip the test for not having enough heap.
  • A few quality-of-life improvements:
    • HeapBlockDevice: Use new (std::nothrow) for all allocations, instead of a mixture of malloc and unchecked new (exceptions are not enabled by Mbed OS).
    • HeapBlockDevice::erase(): Deallocate memory of erased blocks.
    • FileSystemStore unit test: add missing reformat step after erase.
    • Optimize TDBStore's erase operations. It also always calls BlockDevice::erase() regardless of erase value, as erase-before-program is required by BlockDevice's documentation.

Impact of changes

Migration actions required

Documentation

None.


Pull request type

[x] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[] Feature update (New feature / Functionality change / New API)
[] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[] No Tests required for this change (E.g docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[x] Tests / results supplied as part of this PR

Manual testing on K64F:

  • Full RAM used: TDBStore tests run and pass, as before
  • RAM size reduced from 0x30000 to 0x6000 in the linker file (to emulate a low-RAM target): TDBStore tests are skipped correctly.

Reviewers

@ARMmbed/mbed-os-core @geky @davidsaada



/** Program blocks to a block device
*
* The blocks must have been erased prior to being programmed
Copy link
Contributor Author

@LDong-Arm LDong-Arm Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this because it's definitely not true - the heap buffer is only allocated when a block is programmed, not when it's erased.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read this as being about checks the class method will do to enforce that the block is erased before being programmed; policy you must follow as a programmer. I don't see code to check this, but maybe there ought to be some code here if we want to ensure programs written to use the HeapBlockDevice will continue to work when used with actual flash-based devices.

I checked the implementation of is_valid_program(), but it doesn't appear to check that a block was erased before it can be programmed. If this policy is enforced anywhere, I'm not clear on where yet.

Copy link
Contributor Author

@LDong-Arm LDong-Arm Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think it's a real inconsistency in our existing API.

Mbed OS allows a BlockDevice to be either flash-based or not.

  • For flash-based ones (e.g. SPIFBlockDevice), I think the actual erase requirement is set by the hardware (if I understand correctly).
  • For non-flash-based ones, "erase" should in theory be unsupported. One example is SD. From its documentation

    Because an SD card doesn't need preparation, erase is not applicable on SDBlockDevice; it's a no-op operation.

So there's no single policy for all BlockDevices.

HeapBlockDevice has always been quite self-contradicting:

  • It was intended to be non-flash-based. This was why we used to use a flash simulation layer FlashSimBlockDevice on top of it, to simulate erase operations when used for TDBStore which used to required erase (but not anymore).
  • However, a partially implemented & partially lacking HeapBlockDevice::erase() always existed - it performs size/alignment check, but doesn't do the actual erase.
  • And this line of Doxygen we are talking about suggests erase is required before program.

In my opinion, the proper solution for best consistency is remove ::erase and any mentions of "erase" from HeapBlockDevice. This would change the API for users who do like to use ::erase.

Any preference?

Copy link
Contributor Author

@LDong-Arm LDong-Arm Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the SD example I used above, its Doxygen contradicts with its documentation.
Documentation:

Because an SD card doesn't need preparation, erase is not applicable on SDBlockDevice; it's a no-op operation.

Doxygen:

* @note The blocks must be erased prior to programming

So HeapBlockDevice is just another one.

Copy link
Contributor

@Patater Patater Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update SDBlockDevice too, please?

If we remove erase from HeapBlockDevice, it might no longer be swappable for other BlockDevice implementations. HeapBlockDevice::erase() may need to be a no-op for compatibility reasons (with perhaps things like LittleFS layered on top). Also, if HeapBlockDevice::program() allocates memory, what should free the allocated memory? In an earlier iteration of this PR, you had HeapBlockDevice::erase() free memory. If HeapBlockDevice::erase() is instead a no-op, should we move the freeing of allocated memory to HeapBlockDevice::~HeapBlockDevice() instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base class's virtual BlockDevice::erase() is already a no-op. The freeing of allocated memory is also done in HeapBlockDevice::~HeapBlockDevice() already. So removing HeapBlockDevice::erase() and updating Doxygen should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following discussions below, I kept HeapBlockDevice::erase() because it doesn't break compatibility/consistency with the definition of the BlockDevice API.

@ciarmcom ciarmcom added the release-type: patch Indentifies a PR as containing just a patch label Mar 29, 2021
@ciarmcom ciarmcom requested review from a team, davidsaada and geky March 29, 2021 14:30
@ciarmcom
Copy link
Member

@LDong-Arm, thank you for your changes.
@geky @davidsaada @ARMmbed/mbed-os-maintainers please review.

@LDong-Arm LDong-Arm force-pushed the tdbstore_whitebox_low_ram branch 3 times, most recently from c3604a5 to 055168f Compare March 30, 2021 12:02
@LDong-Arm LDong-Arm changed the title Fix TDBStore tests heap checks, improve HeapBlockDevice Improve HeapBlockDevice, TDBStore and tests Mar 30, 2021
@kjbracey
Copy link
Contributor

Looking at this, it seems existing and new TDBStore code is inconsistent with BlockDevice documentation.

TDBStore is taking get_erase_value() being -1 as an indicator "you don't need to erase". That's not what the BlockDevice says - it says value is indeterminate after erase, but it still says you need to erase before programming.

Am I missing something?

Presumably all of this works now, suggesting we have no "erase-needed-with-indeterminate-erase-value" devices, suggesting maybe just BlockDevice doxygen could be modified to drop that possibility.

@LDong-Arm
Copy link
Contributor Author

@kjbracey-arm Thanks, it's a really good point. It looks to me that there's quite a lot of ambiguity around flash vs non-flash BlockDevices in our existing code: how we tell whether erase is supported or not, why some classes don't support erase but have erase sizes nonetheless, etc.

I'll look in more details and try to align things.

@kjbracey
Copy link
Contributor

kjbracey commented Mar 30, 2021

why some classes don't support erase

According to the BlockDevice API as written, erase can certainly be a no-op, as long as the erase_value is -1 so that the caller doesn't expect any particular result from the erase.

But it appears the user contract is that users of the API must always call erase on a location before programming it. Which means there must still be an "erase size", even if erase is a no-op. And it appears BlockDevice has a default implementation which makes that the program size, which is sensible.

So I think the BlockDevice API is consistent, but TDBStore doesn't quite fit to it.

I think I'm mildly in favour of making TDBStore follow BlockDevice docs, rather than changing those docs, but I don't know how complex that would be. It may well not be worth the effort.

If you did want to fix TDBStore, you'd want to make a test-bed BlockDevice that did require erase but had erase_value -1. I guess we don't currently have one.

@LDong-Arm
Copy link
Contributor Author

LDong-Arm commented Mar 31, 2021

@kjbracey-arm Thanks for the explanation. For block devices that's don't actually support erase, it makes sense for them to inherit the default get_erase_size(), get_erase_value(), erase() from the base BlockDevice. They ensure API consistency (i.e. the "user contract" you mentioned).

Looking at this, it seems existing and new TDBStore code is inconsistent with BlockDevice documentation.

TDBStore is taking get_erase_value() being -1 as an indicator "you don't need to erase". That's not what the BlockDevice says - it says value is indeterminate after erase, but it still says you need to erase before programming.

I agree erase value being -1 doesn't necessarily mean erase not supported, by definition. In my opinion, users of the BlockDevice API should not care about the underlying type of device - just call erase() in all cases which is no-op if erase is unsupported. I'll update TDBStore to reflect this.

@LDong-Arm LDong-Arm force-pushed the tdbstore_whitebox_low_ram branch from 055168f to 2a56915 Compare March 31, 2021 11:47
@LDong-Arm LDong-Arm force-pushed the tdbstore_whitebox_low_ram branch from 2a56915 to 4da2392 Compare March 31, 2021 14:49
@LDong-Arm
Copy link
Contributor Author

Please keep review on hold. I'm still seeing a number of issues, trying to debug them...

@LDong-Arm LDong-Arm force-pushed the tdbstore_whitebox_low_ram branch from 4da2392 to e4dd420 Compare April 1, 2021 13:11
Comment on lines 76 to 78
if (fs->mount(&heap) != MBED_SUCCESS) {
EXPECT_EQ(fs->reformat(&heap), MBED_SUCCESS);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Remove this test case as per #12652 (comment)

@LDong-Arm LDong-Arm force-pushed the tdbstore_whitebox_low_ram branch 2 times, most recently from 62f0fe6 to 22bb52d Compare April 6, 2021 15:16
@LDong-Arm
Copy link
Contributor Author

Finally ready for review again, after solving multiple test failures caused by the previous iterations of my changes.

kjbracey
kjbracey previously approved these changes Apr 9, 2021
@mergify mergify bot added needs: CI and removed needs: review labels Apr 9, 2021
@adbridge
Copy link
Contributor

CI started

@LDong-Arm
Copy link
Contributor Author

There's still a todo: #14483 (comment)
I'll update it.

@LDong-Arm LDong-Arm changed the title Improve HeapBlockDevice, TDBStore and tests DO NOT MERGE: Improve HeapBlockDevice, TDBStore and tests Apr 14, 2021
@mbed-ci
Copy link

mbed-ci commented Apr 14, 2021

Jenkins CI Test : ✔️ SUCCESS

Build Number: 1 | 🔒 Jenkins CI Job | 🌐 Logs & Artifacts

CLICK for Detailed Summary

jobs Status
jenkins-ci/mbed-os-ci_unittests ✔️
jenkins-ci/mbed-os-ci_build-greentea-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-greentea-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-example-ARM ✔️
jenkins-ci/mbed-os-ci_build-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_greentea-test ✔️
jenkins-ci/mbed-os-ci_cmake-example-test ✔️

Mbed OS does not enable C++ exceptions, so we should call `new` with
`std::nothrow` which returns a C-style NULL pointer when allocation
fails to allow error handling.

For consistency of style within the same file, this commit also
replaces `malloc()` and `free()` to `new (std::nothrow)` and `delete`.
`HeapBlockDevice::erase()` previously performed a range and alignment
check only. This commit adds freeing of heap memory.
@LDong-Arm LDong-Arm force-pushed the tdbstore_whitebox_low_ram branch from 22bb52d to 7400281 Compare April 14, 2021 16:12
@LDong-Arm LDong-Arm changed the title DO NOT MERGE: Improve HeapBlockDevice, TDBStore and tests Improve HeapBlockDevice, TDBStore and tests Apr 14, 2021
@mergify mergify bot dismissed kjbracey’s stale review April 14, 2021 16:13

Pull request has been modified.

@LDong-Arm
Copy link
Contributor Author

LDong-Arm commented Apr 14, 2021

Now ready again. I just rebased this PR, and did this commit: e807051 as agreed by the test suite's original author: #12652 (comment)

The "erased_set_get" test case deinits `FileSystemStore`, erases the
block device, and reinits `FileSystemStore`. This of course fails,
because `BlockDevice::erase()` removes all stored data including the
format of the `FileSystem` (middle layer), unless the particular type
of block device has a no-op erase implementation.

Note: Previously `HeapBlockDevice::erase()` was essentially a no-op
so this test case did not fail. We recently added the freeing of heap
memory and it uncovered the problem.
Currently `TDBStore::offset_in_erase_unit()` and
`TDBStore::check_erase_before_write()` loop through erase units
one-by-one, until the entire range is covered. This is very inefficient
when the erase size is tiny, e.g. one-byte on a non-flash device for
which we use program as erase.

This commit reworks the algorithms, based on the fact that a block
device can erase or program as many units as needed in one go.
From the documentations of `BlockDevice::get_erase_value()`:

    -1 if you can't rely on the value of the erased storage

and `BlockDevice::program()`:

    The blocks must have been erased prior to being programmed

So, `BlockDevice::erase()` should always be called regardless of
erase value.
Each block of HeapBlockDevice is only allocated from the heap when
that block is programmed. And erasing a block frees the associated
buffer.

To decide if there is enough heap to run the TDBStore Whitebox tests,
we need to perform a trial program() instead of erase().
@LDong-Arm LDong-Arm force-pushed the tdbstore_whitebox_low_ram branch from 7400281 to 7b763be Compare April 14, 2021 16:19
@0xc0170
Copy link
Contributor

0xc0170 commented Apr 14, 2021

CI started

@mbed-ci
Copy link

mbed-ci commented Apr 14, 2021

Jenkins CI Test : ✔️ SUCCESS

Build Number: 2 | 🔒 Jenkins CI Job | 🌐 Logs & Artifacts

CLICK for Detailed Summary

jobs Status
jenkins-ci/mbed-os-ci_unittests ✔️
jenkins-ci/mbed-os-ci_build-greentea-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-ARM ✔️
jenkins-ci/mbed-os-ci_build-greentea-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_cmake-cloud-example-GCC_ARM ✔️
jenkins-ci/mbed-os-ci_build-example-ARM ✔️
jenkins-ci/mbed-os-ci_cmake-example-test ✔️
jenkins-ci/mbed-os-ci_greentea-test ✔️

@0xc0170 0xc0170 merged commit b5652a4 into ARMmbed:master Apr 15, 2021
@mergify mergify bot removed the ready for merge label Apr 15, 2021
@mbedmain mbedmain removed release-type: patch Indentifies a PR as containing just a patch Release-pending labels Apr 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage-kvstore-tdbstore-tests-TESTS-tdbstore-whitebox test fails on Toshiba Target

8 participants